Inferencing in Information Retrieval
Abstract
We have been addressing the problem of providing access to the free text in biomedical databases. The focus of our work is the development of SPECIALIST, an experimental NLP system for the biomedical domain. The system includes a broad coverage parser supported by a large lexicon, a module that provides access to extensive biomedical knowledge sources, and a retrieval module that allows us to carry out experiments in information retrieval. We have recently conducted experiments with a test collection of queries and documents retrieved for those queries. The purpose of the investigation has been to determine the type of information that is required in order to effect a map between the language of queries and the language of relevant documents.
1. INTRODUCTION
Retrieval of information from computerized databases is a complex process whose success depends heavily on the user's knowledge of the structure and logic of the particular database being searched. Many databases have associated with them a controlled indexing vocabulary, or thesaurus, which is the primary access point to the material at search time. For example, the National Library of Medicine's MeSH® thesaurus includes some 16,000 headings that are available for indexing and searching the biomedical literature stored in MEDLINE®, NLM's bibliographic database. The major retrieval strategy is to coordinate MeSH terms with Boolean operators, although limited text word searching of titles and abstracts is also possible.
Several years ago NLM launched its Unified Medical Language System™ (UMLS™) project. This is a major research initiative whose goal is to facilitate retrieval and integration of information from multiple disparate biomedical databases. NLM itself has developed and maintains over 40 databases, and there are many other sources of computerized information in the biomedical sciences. These include factual databases of various kinds, diagnostic expert systems, clinical information systems, and bibliographic databases. The UMLS project is attempting to develop methods that give access to these different systems, with their different vocabularies, in a way that allows the user to navigate among them with relative ease.
Recent results of the project have been the development of an Information Sources Map of biomedical databases, a Metathesaurus™ of biomedical vocabularies, and a Semantic Network of high-level biomedical concepts [1,2]. The first release of the Information Sources Map contains a description of the scope, content, and access conditions for approximately fifty biomedical databases. The Metathesaurus includes over 67,000 biomedical concepts from a variety of controlled vocabularies. It provides definitions, lexical category information, hierarchical contexts, and interrelationships among many of the terms found in its constituent vocabularies. Each concept in the Metathesaurus has been assigned to at least one of the 131 semantic types in the Semantic Network. The Network has top-level nodes for organisms, anatomical structures, biologic function and dysfunction, chemicals, events, and concepts. The Network defines these types and establishes a set of 35 potential relationships between them. These include physical, temporal, functional, and conceptual links, e.g., part of, co-occurs with, causes, measures. The Network and the Metathesaurus together form a rich knowledge source of biomedical concepts. The knowledge sources will continue to be augmented and refined based on experimentation in a variety of applications, including our own.
Our work is motivated by an interest in the development and testing of natural language processing techniques for improved methods of information retrieval. Document retrieval systems, in particular, are "language-rich" and afford the opportunity to conduct basic research in processing complex natural language text. The focus of our work is the development of SPECIALIST, an experimental NLP system for the biomedical domain [3,4,5]. The system includes a broad coverage parser¹ supported by a large lexicon, a module that accesses the UMLS knowledge sources, and a retrieval module. SPECIALIST runs on Sun workstations and is implemented in Quintus Prolog, with some support modules written in C. We have recently conducted experiments using a test collection of user queries and MEDLINE citation records retrieved for those queries. The data for the test collection were se...
¹ During the academic year 1988-1989 we awarded a research contract to the Paoli Research Center of the Unisys Corporation. As a result of this successful collaboration between our two research groups, the syntactic component of the system is extremely robust. See [6,7] for a description of the Paoli system.
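As an illustration of the retrieval strategy mentioned in the introduction (coordinating MeSH terms with Boolean operators), the following is a minimal sketch in Python. It is not the SPECIALIST retrieval module or any NLM system: the citation records, their heading assignments, and the function names (build_index, search_and, search_or, search_not) are hypothetical and chosen only to show how AND, OR, and NOT reduce to set operations over an inverted index.

```python
from collections import defaultdict

# Hypothetical toy records: citation id -> indexing headings.
# These assignments are illustrative only, not real MEDLINE data.
CITATIONS = {
    1: {"Cholera", "Vaccines"},
    2: {"Cholera", "Water Supply"},
    3: {"Vaccines", "Child"},
}


def build_index(citations):
    """Build an inverted index: heading -> set of citation ids."""
    index = defaultdict(set)
    for cid, headings in citations.items():
        for heading in headings:
            index[heading].add(cid)
    return index


def search_and(index, *headings):
    """Citations indexed under ALL of the given headings."""
    postings = [index.get(h, set()) for h in headings]
    return set.intersection(*postings) if postings else set()


def search_or(index, *headings):
    """Citations indexed under ANY of the given headings."""
    return set().union(*(index.get(h, set()) for h in headings))


def search_not(index, include, exclude):
    """Citations indexed under `include` but not under `exclude`."""
    return index.get(include, set()) - index.get(exclude, set())


INDEX = build_index(CITATIONS)

if __name__ == "__main__":
    print(sorted(search_and(INDEX, "Cholera", "Vaccines")))
    print(sorted(search_or(INDEX, "Cholera", "Vaccines")))
    print(sorted(search_not(INDEX, "Cholera", "Vaccines")))
```

The sketch also shows why this strategy depends on the user's knowledge of the indexing vocabulary: a query succeeds only when it is phrased in exactly the headings that were assigned to the documents.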
Similar Resources
Using Graphical Approaches for Entity Reconciliation: Attaining Locally and Globally Consistent Entity Annotations
Graphical models provide a very convenient way of representing entity sequences in several problems related to information retrieval, data mining, etc. Inferencing in such entity sequences involves making use of not only local information (that is, labels depending upon a small window around the entity, giving rise to simpler models like chains, trees, etc.) but also global information (that is, label...
Adapting a diagnostic problem-solving model to information retrieval
In this paper, a competition-based connectionist model for diagnostic problem solving is adapted to information retrieval. In this model we treat documents as disorders and user information needs as manifestations, and a competitive activation mechanism is used which converges to a set of documents that best explain the given user information needs. By combining the ideas of Bayesian inferencing an...
A formal approach for using granularity in the subject domain of infectious diseases
The aim of the experiment is to put the domain- and implementation-independent theory of granularity to the test with the subject domain of human infectious diseases. After determining the data sources, defining the model, and data manipulation operators, the granular perspectives and their levels were defined and contents added. Subsequently, granular information retrieval is tested for cholera a...
CAFIIR: An Image Based CBR/IR Application
In this paper we describe a multimedia application called Computer Aided Facial Image Inferencing and Retrieval (CAFIIR) system. This system uses both Case Based Reasoning and Information Retrieval Techniques. In CAFIIR we use fuzzy measures to represent characteristic features of a human face. This paper describes a method designed to implement inferencing using fuzzy measures. It also describ...
Learning and inferencing in user ontology for personalized Semantic Web search
User modeling is aimed at capturing the users’ interests in a working domain, which forms the basis of providing personalized information services. In this paper, we present an ontology based user model, called user ontology, for providing personalized information service in the Semantic Web. Different from the existing approaches that only use concepts and taxonomic relations for user modeling...
Considering operational issues for multiagent conceptual inferencing in a distributed information retrieval application
Our system, based on a multiagent framework called collaborative understanding of distributed knowledge (CUDK), is designed with the overall goal of balancing agents’ conceptual learning and task accomplishment. The tradeoff between the two is that while conceptual learning allows an agent to improve its own concept base, it could be counter-productive: conceptual learning is time consuming and...